239 research outputs found

    Frobenius norm regularization for the multivariate von Mises distribution

    Penalizing the model complexity is necessary to avoid overfitting when the number of data samples is low with respect to the number of model parameters. In this paper, we introduce a penalization term that places an independent prior distribution on each parameter of the multivariate von Mises distribution. We also propose a circular distance that can be used to estimate the Kullback–Leibler divergence between any two circular distributions as a goodness-of-fit measure. We compare the resulting regularized von Mises models on synthetic data and real neuroanatomical data to show that the distribution fitted using the penalized estimator generally achieves better results than the nonpenalized multivariate von Mises estimator.

    Towards Gaussian Bayesian network fusion

    Data sets are growing in complexity thanks to the increasing facilities we have nowadays to both generate and store data. This poses many challenges to machine learning that are leading to the proposal of new methods and paradigms, in order to be able to deal with what is nowadays referred to as Big Data. In this paper, we propose a method for the aggregation of different Bayesian network structures that have been learned from separate data sets, as a first step towards mining data sets that need to be partitioned horizontally, i.e. with respect to the instances, in order to be processed. Considerations that should be taken into account when dealing with this situation are discussed. Scalable learning of Bayesian networks is slowly emerging, and our method constitutes one of the first insights into Gaussian Bayesian network aggregation from different sources. Tested on synthetic data, it obtains good results that surpass those from individual learning. Future research will focus on expanding the method and testing more diverse data sets.

    Multi-facet determination for clustering with Bayesian networks

    Real-world applications in sectors such as industry, healthcare or finance usually generate data of high complexity that can be interpreted from different viewpoints. When clustering this type of data, a single set of clusters may not suffice, hence the necessity of methods that generate multiple clusterings that represent different perspectives. In this paper, we present a novel multi-partition clustering method that returns several interesting and non-redundant solutions, where each of them is a data partition with an associated facet of data. Each of these facets represents a subset of the original attributes that is selected using our information-theoretic criterion UMRMR. Our approach is based on an optimization procedure that takes advantage of the Bayesian network factorization to provide high quality solutions in a fraction of the time.

    Mining multi-dimensional concept-drifting data streams using Bayesian network classifiers

    In recent years, a plethora of approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most of these approaches can only be applied to uni-dimensional classification problems where each input instance has to be assigned to a single output class variable. The problem of mining multi-dimensional data streams, which include multiple output class variables, is largely unexplored, and only a few streaming multi-dimensional approaches have been recently introduced. In this paper, we propose a novel adaptive method, named Locally Adaptive-MB-MBC (LA-MB-MBC), for mining streaming multi-dimensional data. To this end, we make use of multi-dimensional Bayesian network classifiers (MBCs) as models. LA-MB-MBC monitors the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a concept drift is detected, LA-MB-MBC adapts the current MBC network locally around each changed node. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of the proposed method in terms of concept drift detection as well as classification performance.
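
    The drift-monitoring step mentioned in this abstract can be illustrated with a minimal sketch of a Page-Hinkley test that watches for a drop in a stream of scores, such as average log-likelihoods. This is not the LA-MB-MBC implementation: the class name, the helper `first_drift`, and the values of `delta` and `threshold` are assumptions chosen for illustration only.

```python
class PageHinkley:
    """Page-Hinkley test for a downward shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # detection threshold (assumed value)
        self.n = 0
        self.mean = 0.0             # running mean of the observed scores
        self.cum = 0.0              # cumulative deviation from the running mean
        self.cum_max = 0.0          # largest cumulative deviation seen so far

    def update(self, x):
        """Consume one score; return True if a downward drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        return self.cum_max - self.cum > self.threshold


def first_drift(scores, delta=0.005, threshold=5.0):
    """Index of the first score at which drift is signalled, or None."""
    detector = PageHinkley(delta, threshold)
    for i, score in enumerate(scores):
        if detector.update(score):
            return i
    return None


# Hypothetical stream: the average log-likelihood drops after 50 batches.
scores = [-1.0] * 50 + [-3.0] * 50
drift_index = first_drift(scores)
```

    In a streaming setting, a signalled drift would trigger the model adaptation step, after which the detector would typically be reset.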

    Learning tractable multidimensional Bayesian network classifiers

    Multidimensional classification has become one of the most relevant topics in view of the many domains that require a vector of class values to be assigned to a vector of given features. The popularity of multidimensional Bayesian network classifiers has increased in the last few years due to their expressive power and the existence of methods for learning different families of these models. The problem with this approach is that the computational cost of using the learned models is usually high, especially if there are many class variables. Class-bridge decomposability means that the multidimensional classification problem can be divided into multiple subproblems for these models. In this paper, we prove that class-bridge decomposability can also be used to guarantee the tractability of the models. We also propose a strategy for efficiently bounding their inference complexity, providing a simple learning method with an order-based search that obtains tractable multidimensional Bayesian network classifiers. Experimental results show that our approach is competitive with other state-of-the-art methods and ensures the tractability of the learned models.

    Data publications correlate with citation impact

    Neuroscience and molecular biology have been generating large datasets over the past years that are reshaping how research is being conducted. In their wake, open data sharing has been singled out as a major challenge for the future of research. We conducted a comparative study of citations of data publications in both fields, showing that the average publication tagged with a data-related term by the NCBI MeSH (Medical Subject Headings) curators achieves a significantly larger citation impact than the average in either field. We introduce a new metric, the data article citation index (DAC-index), to identify the most prolific authors among those data-related publications. The study is fully reproducible from an executable Rmd (RMarkdown) script together with all the citation datasets. We hope these results can encourage authors to more openly publish their data.

    Anomaly detection with a spatio-temporal tracking of the laser spot

    Anomaly detection is an important problem with many applications in industry. This paper introduces a new methodology for detecting anomalies in a real laser heating surface process recorded with a high-speed thermal camera (1000 fps, 32×32 pixels). The system is trained with non-anomalous data only (32 videos with 21500 frames). The proposed method is built upon kernel density estimation and is capable of detecting anomalies in time-series data. The classification should be completed in-process, that is, within the cycle time of the workpiece.
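
    As an illustration of the general idea of kernel density estimation trained on non-anomalous data only, the sketch below fits a one-dimensional Gaussian kernel density estimate and flags values whose log-density falls below the lowest value seen in training. It is a simplified stand-in, not the paper's spatio-temporal method: the scalar feature, the bandwidth, the thresholding rule, and all function names are assumptions.

```python
import math


def kde_log_density(x, samples, bandwidth):
    """Log-density at x under a 1-D Gaussian kernel density estimate."""
    kernel_sum = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    if kernel_sum == 0.0:  # x is so far from every sample that the sum underflows
        return float("-inf")
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return math.log(kernel_sum / norm)


def fit_threshold(normal_samples, bandwidth):
    """Lowest log-density observed on the anomaly-free training data."""
    return min(kde_log_density(s, normal_samples, bandwidth) for s in normal_samples)


def is_anomalous(x, normal_samples, bandwidth, threshold):
    """Flag x when it is less likely than every training observation."""
    return kde_log_density(x, normal_samples, bandwidth) < threshold


# Hypothetical per-frame feature values recorded during normal operation.
normal = [100.0 + (i % 10) for i in range(100)]
bandwidth = 1.0  # assumed; in practice chosen by a bandwidth-selection rule
threshold = fit_threshold(normal, bandwidth)

typical_flag = is_anomalous(104.5, normal, bandwidth, threshold)
outlier_flag = is_anomalous(150.0, normal, bandwidth, threshold)
```

    Taking the threshold from the training minimum is only one possible rule; a low quantile of the training log-densities would tolerate occasional noisy frames.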

    Decision functions for chain classifiers based on Bayesian networks for multi-label classification

    Multi-label classification problems require each instance to be assigned a subset of a defined set of labels. This problem is equivalent to finding a multi-valued decision function that predicts a vector of binary classes. In this paper, we study the decision boundaries of two widely used approaches for building multi-label classifiers when Bayesian network-augmented naive Bayes classifiers are used as base models: the binary relevance method and chain classifiers. In particular, extending previous single-label results to multi-label chain classifiers, we find polynomial expressions for the multi-valued decision functions associated with these methods. We prove upper bounds on the expressive power of both methods, and we prove that chain classifiers provide a more expressive model than the binary relevance method.

    Dynamic Bayesian network-based anomaly detection for in-process visual inspection of laser surface heat treatment

    We present the application of a cyber-physical system for in-process quality control based on the visual inspection of a laser surface heat treatment process. To do this, we propose a classification framework that detects anomalies in recorded video sequences that have been preprocessed using a clustering-based method for feature subset selection. One peculiarity of the classification task is that there are no examples with errors, since major irregularities seldom occur in efficient industrial processes. Additionally, the parts to be processed are expensive, so the sample size is small. The proposed framework uses anomaly detection, cross-validation and sampling techniques to deal with these issues. Regarding anomaly detection, dynamic Bayesian networks (DBNs) are used to represent the temporal characteristics of the normal process. Experiments are conducted with two different types of DBN structure learning algorithms, and classification performance is assessed on both anomaly-free examples and sequences with anomalies simulated by experts.

    Directional naive Bayes classifiers

    Directional data are ubiquitous in science. These data have some special properties that rule out the use of classical statistics. Therefore, different distributions and statistics, such as the univariate von Mises and the multivariate von Mises–Fisher distributions, should be used to deal with this kind of information. We extend the naive Bayes classifier to the case where the conditional probability distributions of the predictive variables follow either of these distributions. We consider the simple scenario, where only directional predictive variables are used, and the hybrid case, where discrete, Gaussian and directional distributions are mixed. The classifier decision functions and their decision surfaces are studied at length. Artificial examples are used to illustrate the behavior of the classifiers. The proposed classifiers are then evaluated over eight datasets, showing competitive performance against other naive Bayes classifiers that use Gaussian distributions or discretization to manage directional data.
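
    To make the simple scenario concrete, the following is a minimal sketch of a naive Bayes classifier whose predictors are angles (in radians) modelled with univariate von Mises distributions. It is not the authors' code: the function names are hypothetical, the concentration parameter is obtained from a common closed-form approximation rather than exact maximum likelihood, and the cap on the mean resultant length is an assumed numerical guard.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 1.0, 1.0
    q = (x / 2.0) ** 2
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def von_mises_logpdf(theta, mu, kappa):
    """Log-density of the von Mises distribution with mean mu, concentration kappa."""
    return kappa * math.cos(theta - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


def fit_von_mises(angles):
    """Estimate (mu, kappa) from a list of angles."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    mu = math.atan2(s, c)             # mean direction
    r = min(math.hypot(c, s), 0.99)   # mean resultant length, capped (assumed guard)
    kappa = r * (2.0 - r * r) / (1.0 - r * r)  # closed-form approximation
    return mu, kappa


def fit_naive_bayes(X, y):
    """Per class: log prior and one (mu, kappa) pair per angular predictor."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        params = [fit_von_mises(column) for column in zip(*rows)]
        model[label] = (math.log(len(rows) / len(y)), params)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def score(label):
        log_prior, params = model[label]
        return log_prior + sum(
            von_mises_logpdf(a, mu, kappa) for a, (mu, kappa) in zip(x, params)
        )
    return max(model, key=score)


# Hypothetical data: one angular predictor; "east" wraps around 0 = 2*pi.
X = [[0.1], [6.2], [0.2], [6.1], [3.0], [3.2], [3.1], [3.3]]
y = ["east"] * 4 + ["west"] * 4
model = fit_naive_bayes(X, y)
```

    An angle such as 6.25 radians lies numerically far from 0.1 but close to it on the circle, which is the situation where a Gaussian model of the raw angle would be misleading.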